Journal of Bioinformatics and Systems Biology

Fortune Journals

All preprints, ranked by how well they match Journal of Bioinformatics and Systems Biology's content profile, based on 14 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
The efficiency for recombineering is dependent on the source of the phage recombinase function unit

Chang, Y.; Wang, Q.; Su, T.; Qi, Q.

2019-08-24 bioinformatics 10.1101/745448 medRxiv
Top 0.1%
12.7%

Phage recombinase function units (PRFUs) such as lambda-Red or Rac RecET have been proven to be powerful genetic tools in the recombineering of Escherichia coli. Studies have focused on developing such systems in other bacteria as it is believed that these PRFUs have limited efficiency in distant species. However, how the species evolution distance relates to the efficiency of recombineering remains unclear. Here, we present a thorough study of PRFUs to find features that might be related to the efficiency of PRFUs for recombineering. We first identified 59 unique sets of PRFUs in the genus Corynebacterium and classified them based on their sequence as well as secondary structure similarities. Then both PRFUs from this genus and other bacteria were chosen for experiment based on sequential and secondary structure similarity as well as species distance. These PRFUs were compared for their ability in mediating recombineering with oligo or double-stranded DNA substrates in Corynebacterium glutamicum. We demonstrate that the source of the PRFU is more critical than species distance for the efficiency of recombineering. Our work will provide new ideas for efficient recombineering using PRFUs.

Importance: Recombineering using phage recombinase function units (PRFUs) such as lambda-Red or Rac RecET has gained success in Escherichia coli, while efforts applying these systems in other bacteria were limited by the efficiency. It is believed that the species distance may be a major reason for the low efficiency. In this study, however, we showed that it is the source of PRFU rather than the species distance that matters for the recombineering in Corynebacterium glutamicum. Besides, we also showed that the lower transformation efficiency in other bacteria compared to that of E. coli could be a major reason for the low performance of heterogeneously expressed RecET. These findings will be helpful for the recombineering using PRFUs.

2
Estimating gross transcription rates from RNA level fluctuation data and the effects of sampling time intervals

Xu, Z.; Asakawa, S.

2023-05-24 bioinformatics 10.1101/2023.05.24.541915 medRxiv
Top 0.1%
6.9%

Transcription rates are key biological parameters, but the estimation of transcription rates from RNA level fluctuation data by current methods is still problematic, considering in particular the derived relationship between RNA fragments from different samples and the neglect of the effects of sampling time intervals. Based on defining the gross transcription rate as the amount of converted complete nascent RNA divided by time, the present study developed an algorithm that calculated the cumulative transcription amount and RNA abundance at each time point by simulating moving windows to estimate gross transcription rates from RNA level fluctuation data and explore the effects of sampling time intervals on the estimation. The results showed that the gross transcription rates could be calculated from RNA level fluctuation data with the models fitting the experimental data well. In the analysis of 384 yeast genes, the genes with the highest gross transcription rates mainly played roles in cell division regulation and DNA replication, and the gene utilizing the most cellular resources for gene expression during the experiment was YNR016c, whose main functions are fatty acid biosynthesis and transporting proteins into the nucleus. The shapes of the RNA level curves affected the estimation of gross transcription rates, and the crests and valleys of the RNA level curves responded to higher gross transcription rates. Different scenarios of sampling time intervals could change the shapes of the RNA level curves, resulting in different estimation values of gross transcription rates. Given the potential applications of the present method, further improvements are expected.

3
An algorithm and application to efficiently analyze DNA fiber data

Kirilov, T.; Gospodinov, A.; Kirilov, K.

2021-12-30 bioinformatics 10.1101/2021.12.29.474465 medRxiv
Top 0.1%
6.5%

The duplication of genetic information (DNA replication) is central to life. Numerous control mechanisms ensure the exact course of the process during each cell division. Disturbances of DNA replication have severe consequences for the affected cell, and current models link them to cancer development. One of the most accurate methods for studying DNA replication is labeling newly synthesized DNA molecules with halogenated nucleotides, followed by immunofluorescence and microscopy detection, known as DNA fiber labeling. The method allows the registration of the activity of single replication complexes by measuring the length of the "trace" left by each of them. The major difficulty of the method is the labor-intensive analysis, which requires measuring the lengths of a large number of labeled fragments. Recently, the interest in this kind of image analysis has grown rapidly. In this manuscript, we provide a detailed description of an algorithm and a lightweight Java application to automatically analyze single DNA molecule images we call "DNA size finder". DNA size finder significantly simplified the analysis of the experimental data while increasing reliability by the standardized measurement of a greater number of DNA molecules. It is freely available and does not require any paid platforms or services to be used. We hope that the application will facilitate both the study of DNA replication control and the effects of various compounds used in human activity on the process of DNA replication.

4
A Statistical Detector for Ribosomal Frameshifts and Dual Encodings based on Ribosome Profiling

Yurovsky, A.; Gardin, J.; Futcher, B.; Skiena, S.

2022-06-06 bioinformatics 10.1101/2022.06.06.495024 medRxiv
Top 0.1%
5.1%

During protein synthesis, the ribosome shifts along the messenger RNA (mRNA) by exactly three nucleotides for each amino acid added to the protein being translated. However, in special cases, the sequence of the mRNA somehow induces the ribosome to shift forward by either two or four nucleotides. This shifts the "reading frame" in which the mRNA is translated, and gives rise to an otherwise unexpected protein. Such "programmed frameshifts" are well-known in viruses, including coronavirus, and a few cases of programmed frameshifting are also known in cellular genes. However, there is no good way, either experimental or informatic, to identify novel cases of programmed frameshifting. Thus it is possible that substantial numbers of cellular proteins generated by programmed frameshifting in human and other organisms remain unknown. Here, we build on prior work observing that data from ribosome profiling can be analyzed for anomalies in mRNA reading frame periodicity to identify putative programmed frameshifts. We develop a statistical framework to identify all likely (even for very low frameshifting rates) frameshift positions in a genome. We also develop a frameshift simulator for ribosome profiling data to verify our algorithm. We show high sensitivity of prediction on the simulated data, retrieving 97.4% of the simulated frameshifts. Furthermore, our method found all three of the known yeast genes with programmed frameshifts. We list several hundred yeast genes that may contain +1 or -1 frameshifts. Our results suggest there could be a large number of un-annotated alternative proteins in the yeast genome generated by programmed frameshifting. This motivates further study and parallel investigations in the human genome. Frameshift Detector algorithms and instructions can be accessed in Github: https://github.com/ayurovsky/Frame-Shift-Detector.
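The idea of looking for anomalies in reading-frame periodicity can be illustrated with a toy sketch. It assumes footprint positions have already been reduced to one integer coordinate per read (for example an inferred P-site position), and it only compares the dominant frame in two windows; it is not the statistical framework described in the abstract. The function names `frame_counts` and `dominant_frame` are hypothetical.

```python
def frame_counts(positions, cds_start, lo, hi):
    """Count footprint positions per reading frame (0, 1, 2) within [lo, hi).

    The frame is measured relative to cds_start, the first nucleotide of
    the annotated coding sequence (an assumption of this sketch).
    """
    counts = [0, 0, 0]
    for p in positions:
        if lo <= p < hi:
            counts[(p - cds_start) % 3] += 1
    return counts


def dominant_frame(counts):
    """Return the frame with the most footprints, or None if there are none."""
    if sum(counts) == 0:
        return None
    return max(range(3), key=lambda f: counts[f])


# Toy data: reads sit in frame 0 up to position 9, then in frame 1.
positions = [0, 3, 6, 9, 13, 16, 19]
upstream = frame_counts(positions, 0, 0, 12)
downstream = frame_counts(positions, 0, 12, 21)
print(dominant_frame(upstream), dominant_frame(downstream))
```

A change in the dominant frame between neighbouring windows is only a hint; real profiling data are noisy, so a proper test of significance would be needed before calling a frameshift.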

5
A Unified and Interpretable Framework for Evaluating Fluorescence Trace Quality in Transcription Kinetics

Xing, Y.; Lu, W.-T.; Liu, J.; Zou, Z.; Zhou, R.; Wang, H.; Yang, Y.; Yao, Y.; Yang, Q.; Xu, X.; Zhou, H.

2026-01-08 bioinformatics 10.64898/2026.01.07.698175 medRxiv
Top 0.1%
5.0%

Quantifying transcriptional dynamics from fluorescence traces is a powerful approach to understanding gene regulation, but such analysis critically depends on the quality of the fluorescence signal. Experimental researchers often lack an objective and computationally simple way to assess trace quality before kinetic modeling. In this study, we fill this gap by systematically investigating two key factors (i.e., signal-to-noise ratio (SNR) and trace length) using synthetic data generated from a composite-state Hidden Markov Model (cpHMM) simulator. By analyzing thousands of simulated traces, we identified quantitative thresholds (SNR ≥ 30 dB and length ≥ 360) beyond which transcriptional dynamics can be reliably captured for kinetic inference. Building on these findings, we further discovered a unified and easily computable quality indicator based on the difference between the first two autocorrelation lags. A threshold value of approximately 0.07 effectively separates reliable from low quality traces, providing a simple yet robust criterion for data selection. Together, these results establish a practical framework for assessing fluorescence trace reliability, offering experimental researchers an interpretable and computationally efficient tool to ensure data quality prior to transcription kinetics modeling.
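A quality indicator of this kind might be computed as in the sketch below. The abstract does not say which two lags are meant or how the autocorrelation is normalized, so this assumes the normalized sample autocorrelation at lag 1 minus that at lag 2; the function names are hypothetical, and which side of the 0.07 threshold counts as reliable is left to the paper.

```python
def autocorr(trace, lag):
    """Normalized sample autocorrelation of `trace` at the given lag."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace)
    if var == 0:
        raise ValueError("trace is constant; autocorrelation is undefined")
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def lag_difference(trace):
    """Assumed indicator: autocorrelation at lag 1 minus lag 2."""
    return autocorr(trace, 1) - autocorr(trace, 2)


# Toy trace that alternates every sample, so lag 1 is strongly negative.
trace = [0, 1, 0, 1, 0, 1, 0, 1]
print(lag_difference(trace))
```

The value would then be compared against the threshold reported in the abstract, once the exact definition has been checked against the full text.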

6
Random-effect based test for multinomial logistic regression: choice of the reference level and its impact on the testing

He, Q.; Liu, Y.; Liu, M.; Wu, M.; Hsu, L.

2021-04-19 genetic and genomic medicine 10.1101/2021.04.13.21255272 medRxiv
Top 0.1%
4.9%

Random-effect score test has become an important tool for studying the association between a set of genetic variants and a disease outcome. While a number of random-effect score test approaches have been proposed in the literature, similar approaches for multinomial logistic regression have received less attention. In a recent effort to develop random-effect score test for multinomial logistic regression, we made the observation that such a test is not invariant to the choice of the reference level. This is intriguing because binary logistic regression is well-known to possess the invariance property with respect to the reference level. Here, we investigate why the multinomial logistic regression is not invariant to the reference level, and derive analytic forms to study how the choice of the reference level influences the power. Then we consider several potential procedures that are invariant to the reference level, and compare their performance through numerical studies. Our work provides valuable insights into the properties of multinomial logistic regression with respect to random-effect score test, and adds a useful tool for studying the genetic heterogeneity of complex diseases.

7
PtRNAdb: A web resource of Plant tRNA genes from a wide range of plant species

Das, D.; Zahra, S.; Singh, A.; Kumar, S.

2022-03-04 bioinformatics 10.1101/2022.03.03.482782 medRxiv
Top 0.1%
4.9%

tRNAs, as well as their derived products such as short interspersed nuclear elements (SINEs), pseudogenes and tRNA-derived fragments (tRFs), have now been shown to be vital for cellular life, functioning and adaptation during different stress conditions in all life forms. In this study, we have developed PtRNAdb (www.nipgr.ac.in/PtRNAdb), a plant exclusive tRNA database containing 113849 tRNA gene sequences from phylogenetically diverse plant species. We have analysed a total of 106 nuclear, 89 plastidial and 38 mitochondrial genomes of plants with the tRNAscan-SE software package, and after careful curation of the output data, we developed this database and integrated the data. The information about the tRNA gene sequences obtained was further enriched with a consensus-sequence-based study of tRNA genes based on their isoacceptors and isodecoders. We have also built covariance models based on the isoacceptors and isodecoders of all the tRNA sequences using the Infernal tool. The user can also perform BLAST not only against PtRNAdb entries but also against all the tRNA sequences stored in the PlantRNA database and annotated tRNA genes across the plant kingdom available at NCBI. For the user's ease, we have also incorporated the tRNAscan-SE tool for tRNA gene prediction, and the ViennaRNA package for structural analysis, on the home page of PtRNAdb. This resource is believed to be of high utility for plant researchers as well as molecular biologists to carry out further exploration of the plant tRNAome on a wider spectrum, as well as for performing comparative and evolutionary studies related to tRNAs and their derivatives across all domains of life. Database URL: http://www.nipgr.ac.in/PtRNAdb/

8
Genome-mining algorithm to identify identical repetitive sequences for sensitive and specific diagnostic assays for infectious diseases

Rajeswari, K.; Poojary, R.; Padiwal, S.; Krishna, R. M.; Satyamoorthy, K.; Paul, B.

2024-12-22 bioinformatics 10.1101/2024.12.20.629856 medRxiv
Top 0.1%
4.9%

Nucleic acid amplification-based approaches are extensively used as the first line of choice for infectious diseases. However, the success rates of DNA amplification or hybridization techniques are highly dependent on short primer or probe sequences. A pair of primers that can bind at multiple loci across the genome and randomly amplify multiple copies increases the analytical sensitivity of the currently used diagnostic assays. Herein, we developed a novel genome mining algorithm to identify short identical repeat sequences (IRSs) dispersed across the genome, which can amplify multiple nonhomologous regions of variable sizes via three potential priming combinations. Using this algorithm, we analysed the genomes of five pathogens, namely, gammaherpesvirus, vaccinia virus, Mycobacterium tuberculosis, Plasmodium falciparum, and Phytophthora palmivora, and identified short identical sequences that were repeated at multiple loci. In silico PCR revealed that these identical repeat sequences can amplify multiple copies with different amplicon sizes in these five species. We further performed a polymerase chain reaction assay with short identical repeat pairs identified from M. tuberculosis. Very interestingly, the amplification yielded multiple copies for individual IRSs and even more copies, as in a pair of IRSs. These results indicate that the IRS-based approach can detect pathogens during disease progression in the case of low-concentration DNA. The genome mining algorithm can be used as a translation technology platform for developing highly sensitive varieties of PCR, microarray, loop-mediated isothermal amplification, fluorescence in situ hybridization, and DNA-DNA hybridization-based diagnostic assays.
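The abstract does not describe how the genome-mining algorithm works internally. As a rough illustration of the underlying idea only, short sequences that recur at several loci can be found with a naive k-mer index. This sketch is not the authors' method: it scans one strand, ignores reverse complements and priming combinations, and the function name `repeated_kmers` is hypothetical.

```python
from collections import defaultdict


def repeated_kmers(genome, k, min_copies=2):
    """Map each k-mer seen at least `min_copies` times to its start positions."""
    positions = defaultdict(list)
    for i in range(len(genome) - k + 1):
        positions[genome[i:i + k]].append(i)
    return {kmer: locs for kmer, locs in positions.items()
            if len(locs) >= min_copies}


# Toy sequence in which "ACGT" occurs at positions 0 and 5.
print(repeated_kmers("ACGTTACGTA", 4))
```

For real genomes a dictionary of all k-mers can become large, so a suffix array or a dedicated k-mer counter would be a more practical choice.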

9
Regulation of cytoplasmic mRNA level by chromatin retention

Henfrey, C.; Murphy, S.; Tellier, M.

2022-11-23 bioinformatics 10.1101/2022.10.24.513557 medRxiv
Top 0.1%
4.8%

Transcription and co-transcriptional processes, including pre-mRNA splicing and mRNA cleavage and polyadenylation, regulate the production of mature mRNAs. The carboxyl terminal domain (CTD) of RNA polymerase (pol) II, which comprises 52 repeats of the Tyr1Ser2Pro3Thr4Ser5Pro6Ser7 peptide, is involved in the coordination of transcription with co-transcriptional processes. The pol II CTD is dynamically modified by protein phosphorylation, which regulates recruitment of transcription and co-transcriptional factors. We have investigated whether cytoplasmic levels of mature mRNA from intron-containing protein-coding genes are related to pol II CTD phosphorylation, RNA stability, and pre-mRNA splicing and mRNA cleavage and polyadenylation efficiency. We find that genes that produce a low level of mature mRNA are associated with relatively high phosphorylation of the pol II CTD Tyr1 and Thr4 residues, poor RNA processing, increased chromatin retention, and shorter RNA half-life. While these poorly-processed transcripts are degraded by the nuclear RNA exosome, our results indicate that in addition to RNA half-life, chromatin retention due to a low RNA processing efficiency also plays an important role in the regulation of cytoplasmic mRNA levels.

10
An improved 3DMax algorithm to reconstruct the three-dimensional structure of the chromosome

Liwei Liu; Huili Yao

2020-07-09 bioinformatics 10.1101/2020.07.09.195693 medRxiv
Top 0.1%
4.8%

In recent years, with the development of high-throughput chromosome conformation capture (Hi-C) technology and the reduction of high-throughput sequencing cost, the data volume of whole-genome interaction has increased rapidly, and the resolution of interaction maps keeps improving. Great progress has been made in the research of 3D structure modeling of chromosomes and genomes. Several methods have been proposed to construct the chromosome structure from chromosome conformation capture data. Based on the Hi-C data, this paper analyses the relevant literature of chromosome 3D structure reconstruction and summarizes the principle of 3DMax, which is a classical algorithm to construct the 3D structure of a chromosome. In this paper, we introduce a new gradient ascent optimization algorithm called XNadam that is a variant of the Nadam optimization method. When XNadam is applied to the 3DMax algorithm, the performance of the 3DMax algorithm can be improved, which can be used to predict the three-dimensional structure of a chromosome.

Author summary: The exploration of the three-dimensional structure of chromosomes has gradually become a necessary means to understand the relationship between genome function and gene regulation. An important problem in the construction of a three-dimensional model is how to use the interaction map. Usually, the interaction frequency can be transformed into the spatial distance according to a deterministic or non-deterministic function relationship, and the interaction frequency can be used as a weight in the objective function of the optimization problem. When the interaction frequency is used as a weight in the objective function, the problem we consider is what kind of optimization method should be used to optimize the objective function. In order to solve this problem, we provide an improved stochastic gradient ascent optimization algorithm (XNadam). The XNadam optimization algorithm combined with the maximum likelihood algorithm is applied to a high-resolution Hi-C data set to infer 3D chromosome structure.

11
Evidence for a long-range RNA-RNA interaction between ORF8 and the downstream region of the Spike polybasic insertion of SARS-CoV-2

Manzourolajdad, A.; Pereira, F.

2021-11-09 bioinformatics 10.1101/2021.11.09.467911 medRxiv
Top 0.1%
4.4%

SARS-CoV-2 has affected people worldwide as the causative agent of COVID-19. The virus is related to the highly lethal SARS-CoV responsible for the 2002-2003 SARS outbreak in Asia. Research is ongoing to understand why both viruses have different spreading capacities and mortality rates. Like other beta coronaviruses, RNA-RNA interactions occur between different parts of the viral genomic RNA, resulting in discontinuous transcription and production of various sub-genomic RNAs. These sub-genomic RNAs are then translated into other viral proteins. In this work, we performed a comparative analysis for novel long-range RNA-RNA interactions that may involve the Spike region. Comparing predictions between reference sequences of SARS-CoV-1 and SARS-CoV-2 revealed several predictions amongst which a thermodynamically stable long-range RNA-RNA interaction between (23660-23703 Spike) and (28025-28060 ORF8) unique to SARS-CoV-2 was observed. Using data gathered worldwide, sequence variation patterns observed in the population support the in-silico RNA-RNA base-pairing predictions within these regions, suggesting further evidence for the interaction. The predicted interactions can potentially be related to the regulation of sub-genomic RNA production rates in SARS-CoV-2 and their subsequent accessibility to the host transcriptome.

12
In silico prediction of COVID-19 test efficiency with DinoKnot

Newman, T.; Chang, H. F. K.; Jabbari, H.

2020-09-11 bioinformatics 10.1101/2020.09.11.292730 medRxiv
Top 0.1%
4.3%

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a novel coronavirus spreading across the world causing the disease COVID-19. The diagnosis of COVID-19 is done by quantitative reverse-transcription polymerase chain reaction (qRT-PCR) testing, which utilizes different primer-probe sets depending on the assay used. Using in silico analysis we aimed to determine how the secondary structure of the SARS-CoV-2 RNA genome affects its interaction with the reverse primer during qRT-PCR and how it relates to the experimental primer-probe test efficiencies. We introduce the program DinoKnot (Duplex Interaction of Nucleic acids with pseudoKnots) that follows the hierarchical folding hypothesis to predict the secondary structure of two interacting nucleic acid strands (DNA/RNA) of similar or different type. DinoKnot is the first program that utilizes stable stems in both strands as a guide to find the structure of their interaction. Using DinoKnot we predicted the interaction of the reverse primers used in four common COVID-19 qRT-PCR tests with the SARS-CoV-2 RNA genome. In addition, we predicted how 12 mutations in the primer/probe binding region may affect the primer/probe ability and subsequent SARS-CoV-2 detection. While we found all reverse primers are capable of interacting with their target area, we identified partial mismatching between the SARS-CoV-2 genome and some reverse primers. We predicted three mutations that may prevent primer binding, reducing the ability for SARS-CoV-2 detection. We believe our contributions can aid in the design of a more sensitive SARS-CoV-2 test.

Author summary: The current testing for the disease COVID-19, which is caused by the novel coronavirus SARS-CoV-2, uses oligonucleotides called primers that bind to specific target regions on the SARS-CoV-2 genome to detect the virus. Our goal was to use computational tools to predict how the structure of the SARS-CoV-2 RNA genome affects the ability of the primers to bind to their target region. We introduce the program DinoKnot (Duplex interaction of nucleic acids with pseudoknots) that is able to predict the interactions between two DNA or RNA molecules. We used DinoKnot to predict the efficiency of four common COVID-19 tests, and the effect of mutations in the SARS-CoV-2 virus on the ability of the COVID-19 tests to detect those strains. We predict partial mismatching between some primers and the SARS-CoV-2 genome but that all primers are capable of interacting with their target areas. We also predict three mutations that prevent primer binding and thus SARS-CoV-2 detection. We discuss the limitations of the current COVID-19 testing and suggest the design of a more sensitive COVID-19 test that can be aided by our findings.

13
Jaccard Index Network Analysis for pangenome analysis

Penil-Celis, A.; Redondo-Salvo, S.; Tagg, K. A.; Webb, H. E. E.; Garcillan-Barcia, M. P.; de la Cruz, F.

2025-08-24 bioinformatics 10.1101/2025.08.20.669834 medRxiv
Top 0.1%
4.3%

Jaccard Index Network Analysis (JINA) is a comprehensive workflow designed to explore bacterial genome relationships through an integrated network-based approach. This workflow combines existing tools such as Jaccard Index (1), Gephi (2) and Pangraph (3). By integrating these methodologies into a unified framework, JINA enables efficient visualization and stratification of genomic data, facilitating the identification of meaningful patterns, groups, and associations within bacterial populations. The use of JINA ensures precision in capturing genomic variation including single nucleotide polymorphisms, insertions and deletions. While JINA does not implement Gephi, BLAST, and PanGraph directly in a single software, it guides their coordinated use to analyze and interpret genomic data effectively.
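The Jaccard index at the core of such a workflow is the size of the intersection of two sets divided by the size of their union. The sketch below assumes each genome has already been reduced to a set of feature identifiers (for example gene families), which the abstract does not specify, and builds a weighted edge list of the kind a network viewer could import. The names `jaccard` and `jaccard_edges` are hypothetical and are not part of JINA.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def jaccard_edges(genomes, min_index=0.0):
    """Return (name1, name2, index) for every genome pair at or above min_index."""
    edges = []
    for n1, n2 in combinations(sorted(genomes), 2):
        j = jaccard(genomes[n1], genomes[n2])
        if j >= min_index:
            edges.append((n1, n2, j))
    return edges


# Toy input: genome names mapped to made-up feature identifiers.
genomes = {"g1": {"a", "b"}, "g2": {"a", "b"}, "g3": {"z"}}
print(jaccard_edges(genomes, min_index=0.5))
```

Raising `min_index` prunes weak links, which is one simple way to make groups of closely related genomes stand out in the resulting network.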

14
Stochastic model of BKPy Virus replication and assembly

Stiegelmeyer, S. M.; Jeffers-Francis, L. K.; Giddings, M. C.; Webster-Cyriaque, J.

2019-08-24 bioinformatics 10.1101/746149 medRxiv
Top 0.1%
4.2%

BK Polyomavirus (BKPyV) belongs to the same family as SV40 and JC Virus and has recently been associated with both Sjogren's Syndrome and HIV-associated Salivary Gland Disease. BKPyV was previously only known for causing the rejection of kidney transplants. As such, BKPyV infection of salivary gland cells implicates oral transmission of the virus. BKPyV replicates slowly in salivary gland cells, producing infectious virus after 72-96 hours. However, it remains unclear how this virus infects or replicates within salivary gland cells, blocking the development of therapeutic strategies to inhibit the virus. Thus, an intracellular, computational model using agent-based modeling was developed to model BKPyV replication within a salivary gland cell. In addition to viral proteins, we modeled host cell machinery that aids transcription, translation and replication of BKPyV. The model has separate cytosolic and nuclear compartments, and represents all large molecules such as proteins, RNAs, and DNA as individual computer "agents" that move and interact within the simulated salivary gland cell environment. An application of the Boids algorithm was implemented to simulate molecular binding and formation of BKPyV virions and BKPyV virus-like particles (VLPs). This approach enables the direct study of spatially complex processes such as BKPyV virus self-assembly, transcription, and translation. This model reinforces experimental results implicating the processes that result in the slow accumulation of viral proteins. It revealed that the slow BKPyV replication rate in salivary gland cells might be explained by capsid subunit accumulation rates. BKPyV particles may only form after large concentrations of capsid subunits have accumulated. In addition, salivary gland specific transcription factors may enable early region transcription of BKPyV.

15
The effect of removing repeat-induced overlaps in de novo assembly

Shiarli Hossein Zade, R.; Abeel, T.

2023-04-18 bioinformatics 10.1101/2023.04.16.537101 medRxiv
Top 0.1%
4.1%

Determining accurate genotypes is important for associating phenotypes to genotypes. De novo genome assembly is a critical step to determine the complete genotype for species for which no reference exists yet. The main challenge of de novo eukaryote genome assembly, particularly for plant genomes, is the repetitive DNA sequences within the genomes. The introduction of third generation sequencing and corresponding long reads has promised to resolve repeat-related problems. While there have been notable improvements, reads originating from these repeats are still creating errors because they introduce false overlaps in the assembly graph. This study focuses on analyzing the effect of repeats on de novo assembly and improving the performance of existing de novo assembly algorithms by removing repeat-induced overlaps. First, we show the possible improvements in de novo assembly from removing repeat-induced overlaps. Then we propose several methods for detecting and removing repeat-induced overlaps and evaluate their performance on several simulated datasets.

16
Quickly and simply detection for coronaviruses including SARS-CoV-2 on the mobile Real-Time PCR without treating RNA in advance

Muraoka, M.; Tanoi, Y.; Tada, T.; Mizukoshi, M.; Kawaguchi, O.

2020-11-05 genetic and genomic medicine 10.1101/2020.08.06.20168294 medRxiv
Top 0.1%
4.1%

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was reported to the WHO as an outbreak in Wuhan City, Hubei Province, China at the end of 2019; it subsequently became epidemic in many countries and a worldwide pandemic in 2020. Coronaviruses, including SARS-CoV-2, are usually detected by real-time RT-PCR, but the process takes a long time because the RNA must be extracted, concentrated and purified before RT-PCR detection. We modified various methods and evaluated whether each was short and simple enough. First, real-time RT-PCR could be completed in a very short time using the mobile real-time PCR device PCR1100 (Nippon Sheet Glass Co. Ltd.). Positive control RNA was detected in 20 minutes with the method of the National Institute of Infectious Diseases in Japan (NIID), and in less than 13.5 minutes with the method of the Centers for Disease Control and Prevention in the USA (CDC). Second, and surprisingly, human coronavirus 229E, used as a substitute for SARS-CoV-2, could be detected in a crude state without prior RNA treatment; that is, coronavirus could be detected by direct RT-PCR. This might save time and reduce the risks of secondary infection and contamination. In light of these two points, SARS-CoV-2 might be detected more quickly and more simply. With this mobile real-time PCR device, these methods should be suitable not only for SARS-CoV-2 but also for various other viruses, and might save time compared with earlier detection methods.

17
Systematic functional annotation workflow for insects

Bono, H.; Sakamoto, T.; Kasukawa, T.; Tabunoki, H.

2022-06-11 bioinformatics 10.1101/2022.05.12.490705 medRxiv
Top 0.1%
4.0%

Next generation sequencing has revolutionized entomological study, rendering it possible to analyze the genomes and transcriptomes of non-model insects. However, use of this technology is often limited to obtaining nucleotide sequences of target or related genes, with many of the acquired sequences remaining unused because other available sequences are not sufficiently annotated. To address this issue, we have developed a functional annotation workflow for transcriptome-sequenced insects to determine transcript descriptions, which represents a significant improvement over the previous method (functional annotation pipeline for insects). The developed workflow attempts to annotate not only the protein sequences obtained from transcriptome analysis but also the ncRNA sequences obtained simultaneously. In addition, the workflow integrates the expression level information obtained from transcriptome sequencing for application as functional annotation information. Using the workflow, functional annotation was performed on the sequences obtained from transcriptome sequencing of stick insect (Entoria okinawaensis) and silkworm (Bombyx mori), yielding richer functional annotation information than that obtained in our previous study. The improved workflow allows more comprehensive exploitation of transcriptome data and is applicable to other insects because the workflow has been openly developed on GitHub.

Simple Summary: The function of all genes encoded in the genome should be studied for genome editing. Genome editing technology can speed up insect research for functional analysis of genes. Our knowledge of the functional information of genes is still incomplete, even though genome sequencing of an organism can be completed. The functional information has been annotated based solely on the information that has been obtained from the results of previous biological research. However, this information will be important in determining the target genes for genome editing. In particular, it is very important that this information is in machine-readable form because computer programs mainly parse this information for the understanding of biological systems. In this paper, we describe a workflow-based method for annotating gene functions in insects that makes use of transcribed sequence information as well as reference genome and protein sequence databases. Using the developed workflow, we annotated functional information of Japanese stick insect and silkworm, including gene expression as well as sequence analysis. The functional annotation information obtained by the workflow will greatly expand the possibilities of entomological research using genome editing.

18
Categorization of prophage genes in Bacillus subtilis 168 and assessing their relative importance through RNA-seq gene expression analysis

Ng, W.

2021-10-28 bioinformatics 10.1101/2021.10.26.466030 medRxiv
Top 0.1%
4.0%

Bacteriophages evolve to control the population of fast-growing bacterial cells, without which an explosion in the bacterial population may inflict unimaginable harm on diverse ecosystems. But bacteriophages also "hide" in bacterial genomes when nutritional and environmental circumstances are unfavourable. This involves the integration of the phage genome into the host genome at appropriate genomic loci in a process known as lysogeny. This work sought to delineate the prophages present in the annotated genome of Bacillus subtilis 168, and to assess their relative importance through RNA-seq expression analysis. Firstly, examination of the annotated genome of the model Gram-positive bacterium revealed five distinct prophage regions: SPBeta, prophage 6, PBSX, prophage 3 region, and prophage 1 region. All prophage regions contain host genes, which suggests that host transposase activity has swapped in host genes for phage genes in the prophage genome. Given the significant number of phage genes that have been swapped in each of the prophage genomes, all prophage regions are deemed to be defective. BLAST analysis further highlighted that many of the prophages in B. subtilis are extinct given that they do not have ancestral or daughter brethren. However, RNA-seq transcriptome analysis of B. subtilis turned up an interesting paradox indicative of the important role that host transposase has in swapping in host promoters for prophage genes. Specifically, a significant number of prophage genes are highly expressed, which is implausible given that phage genes should be transcriptionally silent. This result further suggests the relative ease with which host promoters could be swapped in for phage genes, which is indicative of the presence of genomic motifs in the prophage genome recognizable by host transposase. 
The existence of such sequence motifs is thus indicative of possible co-evolution of transposases and phages, where transposases were originally a part of the phage genome and later "jumped out" into the host genome to aid the swapping of host genes into the prophage genome, augmenting the prophage genetic repertoire in the face of changing environmental conditions. Overall, it is not uncommon for bacterial species to harbour multiple prophages. But lysogeny may not be a viable option for long-term preservation of the prophage genetic repertoire given that host transposase would inevitably swap in host genes at random locations in the prophage genome.

19
Weighted Off-target and Efficiency Scoring Reveal Genome Composition-Dependent Optimal CRISPR/Cas9 Guide Design

Krishna Y K, Y.

2025-08-13 bioinformatics 10.1101/2025.08.13.670047 medRxiv
Top 0.1%
4.0%

The efficiency and specificity of guide RNAs continue to be crucial obstacles for successful experimental design, despite the fact that CRISPR/Cas9 has transformed genome editing. In this work, we introduce a computational method for optimizing CRISPR/Cas9 guide RNA that combines PAM diversity, local efficiency penalties, and weighted off-target scoring to find high-performing guides across a range of genome compositions. To capture a variety of natural genomic complexity, we simulated five sample genomes: AT-rich, GC-rich, balanced GC content, and high-repeat variations. All twenty-nucleotide target sequences were scanned for each genome, and off-target potential was assessed by permitting up to two mismatches with weighted penalties for seed region sites. To account for any secondary structure effects, efficiency assessment included both local sliding window penalties and global GC content. Furthermore, we looked at several PAM sequences that were pertinent to various Cas9 variants in order to assess how they affected guide selection. The findings show that efficiency scores vary by genome composition, with the highest scoring guides consistently displaying zero anticipated off-target events. While balanced genomes showed intermediate tendencies, GC-rich genomes tended to yield slightly higher efficiency guides than AT-rich genomes. PAM type affects guide efficiency, according to analysis across several genomes, and the combination of efficiency and off-target score consistently indicates guides with good expected performance. Three-dimensional scatter plots of efficiency and off-target counts versus genomic position, violin plots of off-target distributions, and genome-wide heatmaps emphasizing the best guide positions were used to illustrate these findings. 
In addition to offering a generalizable computational method for choosing CRISPR/Cas9 guides that optimize specificity and efficiency, our study gives fresh insights into the interactions among genome composition, PAM selection, and guide design criteria. By taking into account weighted off-target penalties, genome complexity, and local efficiency effects, this in silico framework overcomes some of the main drawbacks of earlier simulations. It is also easily applicable to direct selection for experimental research on a variety of organisms. The results provide the groundwork for future advancements in genome editing techniques by establishing a predictive computational framework that can expedite CRISPR/Cas9 research and minimize trial and error in guide selection.
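The scoring scheme this abstract describes (scan every 20-nucleotide target next to a PAM, penalize extreme GC content globally and in sliding windows, and count near-matches with up to two mismatches weighted by seed-region position) can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the 5-nt window, the 12-nt PAM-proximal seed, and all numeric weights are assumptions chosen for illustration, and only the forward strand is scanned.

```python
def find_guides(genome, pam="NGG", guide_len=20):
    """Yield (position, guide) for each guide_len-mer directly followed by a PAM match."""
    for i in range(len(genome) - guide_len - len(pam) + 1):
        site = genome[i + guide_len : i + guide_len + len(pam)]
        # 'N' in the PAM pattern matches any base.
        if all(p == "N" or p == s for p, s in zip(pam, site)):
            yield i, genome[i : i + guide_len]


def gc_fraction(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def efficiency(guide, window=5, low=0.2, high=0.8, penalty=0.05):
    """Global GC score (best at 50% GC) minus a penalty per extreme-GC window.

    The window size, thresholds, and penalty are illustrative assumptions.
    """
    score = 1.0 - abs(gc_fraction(guide) - 0.5) * 2
    for j in range(len(guide) - window + 1):
        local = gc_fraction(guide[j : j + window])
        if local < low or local > high:
            score -= penalty
    return max(score, 0.0)


def offtarget_score(guide, genome, own_pos, max_mm=2, seed_len=12,
                    seed_factor=0.2, distal_factor=0.6):
    """Sum a risk value over all other sites with at most max_mm mismatches.

    Each mismatch scales the site's risk down; mismatches in the assumed
    PAM-proximal seed (last seed_len bases) reduce it more strongly.
    """
    n = len(guide)
    total = 0.0
    for i in range(len(genome) - n + 1):
        if i == own_pos:
            continue  # skip the intended on-target site
        risk, mismatches = 1.0, 0
        for k in range(n):
            if genome[i + k] != guide[k]:
                mismatches += 1
                if mismatches > max_mm:
                    break
                risk *= seed_factor if k >= n - seed_len else distal_factor
        if mismatches <= max_mm:
            total += risk
    return total


def rank_guides(genome, pam="NGG", off_weight=1.0):
    """Return (combined, position, guide, efficiency, off_target) tuples, best first."""
    scored = []
    for pos, guide in find_guides(genome, pam):
        eff = efficiency(guide)
        off = offtarget_score(guide, genome, pos)
        scored.append((eff - off_weight * off, pos, guide, eff, off))
    return sorted(scored, reverse=True)
```

Calling `rank_guides(genome)` on an uppercase DNA string returns candidates sorted by the combined score; passing a different `pam` pattern such as `"NAG"` mimics comparing PAM types, again purely as an illustration.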

20
Predicting Gene Expression from DNA Sequence using Residual Neural Network

Zhang, Y.; Zhou, X.; Cai, X.

2020-06-22 bioinformatics 10.1101/2020.06.21.163956 medRxiv
Top 0.1%
4.0%

It is known that cis-acting DNA motifs play an important role in regulating gene expression. The genome in a cell thus contains information that not only encodes the synthesis of proteins but is also necessary for regulating the expression of genes. Therefore, the mRNA level of a gene may be predictable from the DNA sequence. Indeed, three deep neural network models were developed recently to predict the mRNA level of a gene directly or indirectly from the DNA sequence around the transcription start site of the gene. In this work, we develop a deep residual network model, named ExpResNet, to predict gene expression directly from DNA sequence. Applying ExpResNet to the GTEx data, we demonstrate that ExpResNet outperforms the three existing models across the four tissues tested. Our model may be useful in the investigation of gene regulation.